
Speed up Hive queries with Databend

November 28, 2022 · 2 min read

Background

Databend now supports a Hive catalog for running Hive queries. This document shows how to set up a Databend-Hive environment and run Hive SQL.

How to set up a Databend-Hive cluster

HiveServer, the Hive metastore, and HDFS are assumed to be installed already.

Download a Databend release with Hive support, or build from source:

## make sure JAVA_HOME is set
export JAVA_HOME=/path/to/java
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
cargo build --features hive,storage-hdfs

Set up a Databend cluster; refer to deploying-databend.
Add the Hive catalog and HDFS storage to databend-query.toml:

[storage]
type = "hdfs"

[storage.hdfs]
# HDFS namenode address, such as 127.0.0.1:8020
name_node = "xx"
root = ""

[catalogs.hive]
type = "hive"
# Hive metastore address, such as 127.0.0.1:9083
address = "xx"

Run databend-query with the Java and Hadoop environment set:

export HADOOP_HOME=xxx
# JAVA_HOME example: /Library/Java/JavaVirtualMachines/openjdk-11.jdk/Contents/Home
export JAVA_HOME=xxx
export LD_LIBRARY_PATH=$JAVA_HOME/lib/server:$LD_LIBRARY_PATH

./bin/databend-query -c ./databend-query.toml > query.log 2>&1 &
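Before running the launch command above, it can help to confirm that the environment is actually in place. The following is a minimal sketch, not part of Databend: check_hive_env is a hypothetical helper that only checks the three variables named in this post and assumes the JVM library directory is $JAVA_HOME/lib/server, as in the exports above.

```shell
#!/bin/sh
# check_hive_env is a hypothetical preflight helper, not a Databend command.
# It prints "ok" when JAVA_HOME and HADOOP_HOME are set and LD_LIBRARY_PATH
# contains $JAVA_HOME/lib/server; otherwise it prints the problem and returns 1.
check_hive_env() {
    missing=""
    for name in JAVA_HOME HADOOP_HOME; do
        # Read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    # Wrap the path list in colons so each entry can be matched exactly.
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$JAVA_HOME/lib/server:"*) ;;
        *)
            echo "LD_LIBRARY_PATH lacks $JAVA_HOME/lib/server"
            return 1
            ;;
    esac
    echo "ok"
}
```

One way to use it is to gate the launch on the check, for example `check_hive_env && ./bin/databend-query -c ./databend-query.toml > query.log 2>&1 &`.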

Configure Hive-related settings from a MySQL client:

set global sql_dialect = 'hive';

Suggested settings:

-- for Chinese users
set global timezone = 'Asia/Shanghai';
set global max_execute_time = 180000;

-- provide Hive's nvl function
create FUNCTION nvl as (a,b) -> ifnull(a,b);

Query Hive data using a MySQL client or a MySQL JDBC client. Note: Hive tables must be referred to as hive.db.table:

select * from hive.$db.$table limit 10;
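To illustrate the hive.db.table naming rule, the following shell sketch builds a fully qualified query string. hive_select_sql is a hypothetical helper, the database and table names are placeholders, and the host, port 3307, and user root in the commented mysql invocation are assumptions about a local deployment rather than values given in this post.

```shell
#!/bin/sh
# hive_select_sql is a hypothetical helper for illustration only.
# It prints a SELECT that refers to a Hive table as hive.<db>.<table>.
# Arguments: database name, table name, optional row limit (default 10).
hive_select_sql() {
    db=$1
    table=$2
    limit=${3:-10}
    printf 'select * from hive.%s.%s limit %s;\n' "$db" "$table" "$limit"
}

# Example invocation (assumed host, port, and user; adjust to your deployment):
# mysql -h 127.0.0.1 -P 3307 -u root -e "$(hive_select_sql sales orders 10)"
```

Generating the prefix in one place makes it harder to forget the hive. catalog qualifier when scripting many queries.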

Limitations

Only Parquet tables are supported; ORC and text tables are not.
The Hive data types struct, map, and decimal are not supported.
Only Hive SELECT queries are supported; DDL, INSERT, and other DML statements are not.
Hive UDFs are not supported, and only a limited set of Hive functions is available.

Hive support is currently in beta; please feel free to report bugs and suggestions in Databend issues.
